Córdoba Province
CLIRudit: Cross-Lingual Information Retrieval of Scientific Documents
Valentini, Francisco, Kozlowski, Diego, Larivière, Vincent
Cross-lingual information retrieval (CLIR) helps users find documents in languages different from their queries. This is especially important in academic search, where key research is often published in non-English languages. We present CLIRudit, a novel English-French academic retrieval dataset built from Érudit, a Canadian publishing platform. Using multilingual metadata, we pair English author-written keywords as queries with non-English abstracts as target documents, a method that can be applied to other languages and repositories. We benchmark various first-stage sparse and dense retrievers, with and without machine translation. We find that dense embeddings without translation perform nearly as well as systems using machine translation, that translating documents is generally more effective than translating queries, and that sparse retrievers with document translation remain competitive while offering greater efficiency. Along with releasing the first English-French academic retrieval dataset, we provide a reproducible benchmarking method to improve access to non-English scholarly content.
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > Singapore (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
Bringing Federated Learning to Space
Kim, Grace, Svoboda, Filip, Lane, Nicholas
As Low Earth Orbit (LEO) satellite constellations rapidly expand to hundreds and thousands of spacecraft, the need for distributed on-board machine learning becomes critical to address downlink bandwidth limitations. Federated learning (FL) offers a promising framework to conduct collaborative model training across satellite networks. Realizing its benefits in space naturally requires addressing space-specific constraints, from intermittent connectivity to dynamics imposed by orbital motion. This work presents the first systematic feasibility analysis of adapting off-the-shelf FL algorithms for satellite constellation deployment. We introduce a comprehensive "space-ification" framework that adapts terrestrial algorithms (FedAvg, FedProx, FedBuff) to operate under orbital constraints, producing an orbital-ready suite of FL algorithms. We then evaluate these space-ified methods through extensive parameter sweeps across 768 constellation configurations that vary cluster sizes (1-10), satellites per cluster (1-10), and ground station networks (1-13). Our analysis demonstrates that space-adapted FL algorithms efficiently scale to constellations of up to 100 satellites, achieving performance close to the centralized ideal. Multi-month training cycles can be reduced to days, corresponding to a 9x speedup through orbital scheduling and local coordination within satellite clusters. These results provide actionable insights for future mission designers, enabling distributed on-board learning for more autonomous, resilient, and data-driven satellite operations. Low Earth Orbit (LEO) satellite constellations are expanding rapidly, supporting applications in Earth observation (EO), telecommunications, and navigation. Large-scale constellations such as Planet Labs' Dove fleet, SpaceX's Starlink, and Amazon's Project Kuiper already consist of hundreds to thousands of spacecraft, representing some of the largest distributed systems ever deployed.
This unprecedented scale is driving a dramatic increase in the volume and diversity of space-based data. Earth observation missions in particular bear the brunt of this data challenge. High-resolution missions such as Landsat-8 produce 1.8 GB per scene and more than 400 TB annually [1]. At constellation scale, Planet Labs' fleet of over 200 satellites generates terabytes of imagery each day [2].
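The aggregation step at the heart of FedAvg, one of the terrestrial algorithms named above, can be sketched in a few lines. This is a minimal illustration of the standard sample-weighted average only; the function name `fedavg`, the flat parameter lists, and the idea that only currently reachable satellites contribute to a round are assumptions for illustration, not the paper's space-ified implementation.

```python
def fedavg(client_updates):
    """Aggregate client parameter vectors, weighting each by its sample count.

    client_updates: list of (num_samples, parameters) pairs, where parameters
    is a flat list of floats (a simplification assumed for this sketch).
    """
    total = sum(n for n, _ in client_updates)
    if total <= 0:
        raise ValueError("at least one client must report samples")
    dim = len(client_updates[0][1])
    averaged = [0.0] * dim
    for n, params in client_updates:
        if len(params) != dim:
            raise ValueError("all clients must share the same parameter shape")
        for k, value in enumerate(params):
            # Each client contributes in proportion to its share of the data.
            averaged[k] += (n / total) * value
    return averaged


# Hypothetical round: two satellites are in contact and report their updates.
visible = [(100, [1.0, 2.0]), (300, [3.0, 6.0])]
global_params = fedavg(visible)
```

In an orbital setting the list passed to the aggregator would change from round to round as satellites move in and out of contact, which is one reason scheduling matters for training time.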
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Oceania > Australia > Northern Territory > Alice Springs (0.04)
- Asia > China > Beijing > Beijing (0.04)
- Aerospace & Defense (1.00)
- Information Technology (0.67)
- Telecommunications (0.67)
Short-Term Regional Electricity Demand Forecasting in Argentina Using LSTM Networks
This study presents the development and optimization of a deep learning model based on Long Short-Term Memory (LSTM) networks to predict short-term hourly electricity demand in Córdoba, Argentina. Integrating historical consumption data with exogenous variables (climatic factors, temporal cycles, and demographic statistics), the model achieved high predictive precision, with a mean absolute percentage error of 3.20% and a coefficient of determination of 0.95. The inclusion of periodic temporal encodings and weather variables proved crucial to capture seasonal patterns and extreme consumption events, enhancing the robustness and generalizability of the model. In addition to the design and hyperparameter optimization of the LSTM architecture, two complementary analyses were carried out: (i) an interpretability study using Random Forest regression to quantify the relative importance of exogenous drivers, and (ii) an evaluation of model performance in predicting the timing of daily demand maxima and minima, achieving exact-hour accuracy in more than two-thirds of the test days and within ±1 hour in over 90% of cases. Together, these results highlight both the predictive accuracy and operational relevance of the proposed framework, providing valuable insights for grid operators seeking optimized planning and control strategies under diverse demand scenarios.
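Two ingredients mentioned in this abstract, periodic temporal encodings and the mean absolute percentage error, are easy to illustrate. The sketch below assumes the common sine/cosine encoding of a cyclic quantity (the abstract does not specify which encoding the authors used), and the function names and toy demand values are made up for illustration.

```python
import math


def encode_cyclic(value, period):
    """Map a periodic quantity (e.g. hour of day) to a (sin, cos) pair."""
    angle = 2.0 * math.pi * (value % period) / period
    return math.sin(angle), math.cos(angle)


def mape(actual, predicted):
    """Mean absolute percentage error in percent; actual values must be nonzero."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hours 23 and 0 are neighbours on the circle, although far apart as integers.
h23 = encode_cyclic(23, 24)
h0 = encode_cyclic(0, 24)
h12 = encode_cyclic(12, 24)
gap_adjacent = math.dist(h23, h0)
gap_opposite = math.dist(h12, h0)

# Toy demand values, invented for this example (not data from the study).
demand = [100.0, 200.0, 400.0]
forecast = [110.0, 190.0, 400.0]
error = mape(demand, forecast)
```

The encoding gives the network a continuous notion of time of day, so the jump from 23:00 to 00:00 is no larger than any other one-hour step.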
- South America > Argentina > Pampas > Córdoba Province > Córdoba (0.24)
- Europe > Italy (0.14)
- North America > United States > Florida (0.14)
CaMMT: Benchmarking Culturally Aware Multimodal Machine Translation
Villa-Cueva, Emilio, Bolatzhanova, Sholpan, Turmakhan, Diana, Elzeky, Kareem, Ademtew, Henok Biadglign, Aji, Alham Fikri, Araujo, Vladimir, Azime, Israel Abebe, Baek, Jinheon, Belcavello, Frederico, Cristobal, Fermin, Cruz, Jan Christian Blaise, Dabre, Mary, Dabre, Raj, Ehsan, Toqeer, Etori, Naome A, Farooqui, Fauzan, Geng, Jiahui, Ivetta, Guido, Jayakumar, Thanmay, Jeong, Soyeong, Lim, Zheng Wei, Mandal, Aishik, Martinelli, Sofia, Mihaylov, Mihail Minkov, Orel, Daniil, Pramanick, Aniket, Purkayastha, Sukannya, Salazar, Israfel, Song, Haiyue, Torrent, Tiago Timponi, Yadeta, Debela Desalegn, Hamed, Injy, Tonja, Atnafu Lambebo, Solorio, Thamar
Translating cultural content poses challenges for machine translation systems due to the differences in conceptualizations between cultures, where language alone may fail to convey sufficient context to capture region-specific meanings. In this work, we investigate whether images can act as cultural context in multimodal translation. We introduce CaMMT, a human-curated benchmark of over 5,800 triples of images along with parallel captions in English and regional languages. Using this dataset, we evaluate five Vision Language Models (VLMs) in text-only and text+image settings. Through automatic and human evaluations, we find that visual context generally improves translation quality, especially in handling Culturally-Specific Items (CSIs), disambiguation, and correct gender marking. By releasing CaMMT, our objective is to support broader efforts to build and evaluate multimodal translation systems that are better aligned with cultural nuance and regional variations.
- Asia > India (0.04)
- South America > Argentina > Pampas > Córdoba Province > Córdoba (0.04)
- North America > Mexico > Jalisco (0.04)
Shedding Light on Dark Matter at the LHC with Machine Learning
Arganda, Ernesto, Rios, Martín de los, Perez, Andres D., Roy, Subhojit, Seoane, Rosa M. Sandá, Wagner, Carlos E. M.
We investigate a WIMP dark matter (DM) candidate in the form of a singlino-dominated lightest supersymmetric particle (LSP) within the $Z_3$-symmetric Next-to-Minimal Supersymmetric Standard Model. This framework gives rise to regions of parameter space where DM is obtained via co-annihilation with nearby higgsino-like electroweakinos and DM direct detection signals are suppressed, the so-called "blind spots". On the other hand, collider signatures remain promising due to enhanced radiative decay modes of higgsinos into the singlino-dominated LSP and a photon, rather than into leptons or hadrons. This motivates searches for radiatively decaying neutralinos; however, these signals face substantial background challenges, as the decay products are typically soft due to the small mass splittings ($Δm$) between the LSP and the higgsino-like co-annihilation partners. We apply a data-driven Machine Learning (ML) analysis that improves sensitivity to these subtle signals, offering a powerful complement to traditional search strategies to discover a new physics scenario. Using an LHC integrated luminosity of $100~\mathrm{fb}^{-1}$ at $14~\mathrm{TeV}$, the method achieves a $5σ$ discovery reach for higgsino masses up to $225~\mathrm{GeV}$ with $Δm \lesssim 12~\mathrm{GeV}$, and a $2σ$ exclusion up to $285~\mathrm{GeV}$ with $Δm \lesssim 20~\mathrm{GeV}$. These results highlight the power of collider searches to probe DM candidates that remain hidden from current direct detection experiments, and provide a motivation for a search by the LHC collaborations using ML methods.
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Europe > Spain > Galicia > Madrid (0.04)
- Asia > China (0.04)
- Energy (0.93)
- Government > Regional Government (0.45)
Towards an Accurate and Effective Robot Vision (The Problem of Topological Localization for Mobile Robots)
Topological localization is a fundamental problem in mobile robotics, since robots must be able to determine their position in order to accomplish tasks. Visual localization and place recognition are challenging due to perceptual ambiguity, sensor noise, and illumination variations. This work addresses topological localization in an office environment using only images acquired with a perspective color camera mounted on a robot platform, without relying on temporal continuity of image sequences. We evaluate state-of-the-art visual descriptors, including Color Histograms, SIFT, ASIFT, RGB-SIFT, and Bag-of-Visual-Words approaches inspired by text retrieval. Our contributions include a systematic, quantitative comparison of these features, distance measures, and classifiers. Performance was analyzed using standard evaluation metrics and visualizations, extending previous experiments. Results demonstrate the advantages of proper configurations of appearance descriptors, similarity measures, and classifiers. The quality of these configurations was further validated in the Robot Vision task of the ImageCLEF evaluation campaign, where the system identified the most likely location of novel image sequences. Future work will explore hierarchical models, ranking methods, and feature combinations to build more robust localization systems, reducing training and runtime while avoiding the curse of dimensionality. Ultimately, this aims toward integrated, real-time localization across varied illumination and longer routes.
- Europe > Russia (0.14)
- Europe > Spain > Castilla-La Mancha > Albacete Province > Albacete (0.04)
- Europe > Romania > Nord-Vest Development Region > Cluj County > Cluj-Napoca (0.04)
Algorithmic Detection of Rank Reversals, Transitivity Violations, and Decomposition Inconsistencies in Multi-Criteria Decision Analysis
Borda, Agustín, Cabral, Juan Bautista, Giarda, Gonzalo, Irusta, Diego Nicolás Gimenez, Pacheco, Paula, Schachner, Alvaro Roy
Our work focuses on providing a mechanism capable of measuring the performance of an MCDM method on a given set of alternatives, with the collateral goal of building a global ranking of the effectiveness of different MCDM methods. We have implemented these tests within the open-source Scikit-Criteria library, leveraging its RankResult and RanksComparator data structures as fundamental building blocks for comparative ranking analysis. RRT1 systematically evaluates the stability of the optimal alternative when suboptimal alternatives are degraded, employing a controlled mutation strategy and providing comprehensive documentation of the experimental context. This approach provides decision analysts with the following: (1) quantitative stability assessment: precise measures of how often methods exhibit rank reversal; (2) sensitivity mapping: identification of which alternatives and criteria are most prone to instability; (3) method comparison: an objective basis for comparing the robustness of different MCDA approaches; (4) confidence intervals: statistical bounds on decision reliability through repeated experimentation. The algorithm addresses the complications that arise from preprocessing pipelines that can eliminate alternatives, ensuring "graceful degradation" by assigning appropriate worst ranks to maintain completeness.
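The idea behind the RRT1 check described above (degrade a suboptimal alternative, then confirm the winner has not changed) can be sketched without any library. This is an illustrative toy, not the Scikit-Criteria implementation: the weighted-sum ranking with max-normalisation, the fixed degradation factor of 0.9, and the names `weighted_sum_rank` and `best_is_stable` are all assumptions made for this example.

```python
def weighted_sum_rank(matrix, weights):
    """Rank alternatives (rows) by a weighted sum of max-normalised benefit criteria.

    Returns alternative indices ordered from best to worst.
    """
    col_max = [max(col) for col in zip(*matrix)]
    scores = [
        sum(w * v / m for w, v, m in zip(weights, row, col_max))
        for row in matrix
    ]
    return sorted(range(len(matrix)), key=lambda i: (-scores[i], i))


def best_is_stable(rank_fn, matrix, weights, factor=0.9):
    """Degrade each suboptimal alternative in turn and record any change of winner."""
    best = rank_fn(matrix, weights)[0]
    reversals = []
    for idx in range(len(matrix)):
        if idx == best:
            continue
        mutated = [list(row) for row in matrix]
        # Make alternative idx strictly worse on every (benefit) criterion.
        mutated[idx] = [v * factor for v in mutated[idx]]
        if rank_fn(mutated, weights)[0] != best:
            reversals.append(idx)
    return best, reversals


# Toy decision matrix: three alternatives, two benefit criteria (invented values).
matrix = [[9.0, 5.0], [7.0, 8.0], [4.0, 10.0]]
weights = [0.5, 0.5]
ranking = weighted_sum_rank(matrix, weights)
best, reversals = best_is_stable(weighted_sum_rank, matrix, weights)
```

An empty `reversals` list means that no single degradation of a losing alternative dislodged the winner in this toy case; a non-empty list would name the alternatives whose degradation triggered a rank reversal.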
- South America > Argentina > Pampas > Córdoba Province > Córdoba (0.04)
- South America > Argentina > Pampas > Buenos Aires F.D. > Buenos Aires (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Switzerland > Geneva > Geneva (0.04)
Towards culturally-appropriate conversational AI for health in the majority world: An exploratory study with citizens and professionals in Latin America
Peters, Dorian, Espinoza, Fernanda, da Re, Marco, Ivetta, Guido, Benotti, Luciana, Calvo, Rafael A.
There is justifiable interest in leveraging conversational AI (CAI) for health across the majority world, but to be effective, CAI must respond appropriately within culturally and linguistically diverse contexts. Therefore, we need ways to address the fact that current LLMs exclude many lived experiences globally. Various advances are underway which focus on top-down approaches and increasing training data. In this paper, we aim to complement these with a bottom-up, locally grounded approach based on qualitative data collected during participatory workshops in Latin America. Our goal is to construct a rich and human-centred understanding of: a) potential areas of cultural misalignment in digital health; b) regional perspectives on chatbots for health; and c) strategies for creating culturally appropriate CAI; with a focus on the understudied Latin American context. Our findings show that academic boundaries on notions of culture lose meaning at the ground level and technologies will need to engage with a broader framework; one that encapsulates the way economics, politics, geography and local logistics are entangled in cultural experience. To this end, we introduce a framework for 'Pluriversal Conversational AI for Health' which allows for the possibility that more relationality and tolerance, rather than just more data, may be called for.
- North America > Central America (0.61)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)
- North America > United States (0.14)
- Health & Medicine > Health Care Providers & Services (1.00)
- Health & Medicine > Consumer Health (1.00)
- Government (1.00)
La Leaderboard: A Large Language Model Leaderboard for Spanish Varieties and Languages of Spain and Latin America
Grandury, María, Aula-Blasco, Javier, Falcão, Júlia, Fourrier, Clémentine, González, Miguel, Martínez, Gonzalo, Santamaría, Gonzalo, Agerri, Rodrigo, Aldama, Nuria, Chiruzzo, Luis, Conde, Javier, Gómez, Helena, Guerrero, Marta, Ivetta, Guido, López, Natalia, Plaza-del-Arco, Flor Miriam, Martín-Valdivia, María Teresa, Montoro, Helena, Muñoz, Carmen, Reviriego, Pedro, Rosado, Leire, Vaca, Alejandro, Vallecillo-Rodríguez, María Estrella, Vallego, Jorge, Zubiaga, Irune
Leaderboards showcase the current capabilities and limitations of Large Language Models (LLMs). To motivate the development of LLMs that represent the linguistic and cultural diversity of the Spanish-speaking community, we present La Leaderboard, the first open-source leaderboard to evaluate generative LLMs in languages and language varieties of Spain and Latin America. La Leaderboard is a community-driven project that aims to establish an evaluation standard for everyone interested in developing LLMs for the Spanish-speaking community. This initial version combines 66 datasets in Basque, Catalan, Galician, and different Spanish varieties, showcasing the evaluation results of 50 models. To encourage community-driven development of leaderboards in other languages, we explain our methodology, including guidance on selecting the most suitable evaluation setup for each downstream task. In particular, we provide a rationale for using fewer few-shot examples than typically found in the literature, aiming to reduce environmental impact and facilitate access to reproducible results for a broader research community.
- North America > Central America (0.60)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Law (1.00)
- Education (0.94)
- Information Technology > Security & Privacy (0.94)
Revisiting Graph Projections for Effective Complementary Product Recommendation
Anghinoni, Leandro, Zivic, Pablo, Sanchez, Jorge Adrian
Complementary product recommendation is a powerful strategy to improve customer experience and retail sales. However, recommending the right product is not a simple task because of the noisy and sparse nature of user-item interactions. In this work, we propose a simple yet effective method to predict a list of complementary products given a query item, based on the structure of a directed weighted graph projected from the user-item bipartite graph. We revisit bipartite graph projections for recommender systems and propose a novel approach for inferring complementarity relationships from historical user-item interactions. We compare our model with recent methods from the literature and show, despite the simplicity of our approach, an average improvement of +43% and +38% over sequential and graph-based recommenders, respectively, over different benchmarks.
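To make the notion of projecting a user-item bipartite graph onto a directed weighted item graph concrete, here is a small sketch. The edge weight used, w(i -> j) = (users who interacted with both i and j) / (users who interacted with i), is a simple conditional-frequency choice assumed for illustration; the abstract does not state the authors' actual weighting, and the names `project_items` and `recommend` and the toy baskets are hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def project_items(baskets):
    """Project user-item interactions onto a directed weighted item graph.

    Assumed weighting: w(i -> j) = (# users with both i and j) / (# users with i).
    """
    item_count = defaultdict(int)
    pair_count = defaultdict(int)
    for items in baskets.values():
        unique = set(items)
        for i in unique:
            item_count[i] += 1
        # Ordered pairs, so both directions i -> j and j -> i are counted.
        for i, j in permutations(unique, 2):
            pair_count[(i, j)] += 1
    graph = defaultdict(dict)
    for (i, j), n in pair_count.items():
        graph[i][j] = n / item_count[i]
    return graph


def recommend(graph, query, k=3):
    """Return up to k out-neighbours of the query item, heaviest edge first."""
    neighbours = graph.get(query, {})
    return sorted(neighbours, key=lambda j: (-neighbours[j], j))[:k]


# Toy interaction data, invented for this example.
baskets = {
    "u1": ["phone", "case", "charger"],
    "u2": ["phone", "case"],
    "u3": ["phone", "charger"],
    "u4": ["case"],
    "u5": ["phone", "case"],
}
graph = project_items(baskets)
top = recommend(graph, "phone", k=2)
```

Because the weights are normalised by the source item, the graph is asymmetric: in the toy data every charger buyer also has a phone, while only half of the phone buyers have a charger. That directionality is what distinguishes complementarity from plain co-occurrence.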
- South America > Brazil (0.04)
- South America > Argentina > Pampas > Córdoba Province > Córdoba (0.04)
- South America > Argentina > Pampas > Buenos Aires F.D. > Buenos Aires (0.04)
- Research Report (0.84)
- Overview (0.66)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Information Management (0.93)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.89)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)